Re: Seg

der Mouse (mouse@Collatz.McRCIM.McGill.EDU)
Sat, 8 Oct 1994 13:15:34 -0400
Next message: Jonathan M. Bresler: "Re: syslog idea"
Previous message: Doug Hughes: "Re: Security Info (root broken)"
Maybe in reply to: Michael Bresnahan: "Seg"
> To: Fault@winternet.com, Summary@winternet.com, bugtraq@crimelab.com
> Subject: Seg

Methinks perhaps you forgot the quotes around the subject :-)

> For general background, a segmentation fault occurs when an
> "unprivileged" process accesses a memory address which is not in its
> address space or tries to write to memory which has been marked
> read-only.

Well, mostly.  A segmentation fault in the sense of something that
generates a SIGSEGV is only a subset of those; if a process tries to
access nonexistent virtual memory just off the end of its stack, the
kernel will normally just grow the stack instead.  And when
copy-on-write memory is created (e.g., via fork() or mmap()), the memory
is normally set up as read-only, and a write access causes the kernel
to transparently create a read/write copy of the page and let the
process write to that.
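The write-to-read-only case can be made concrete with a minimal sketch. It assumes a present-day POSIX system, which the original discussion does not. The sketch maps a page, marks it read-only with mprotect(), and catches the resulting signal. The helper names map_page() and store_faults() are hypothetical. Some systems report this particular fault as SIGBUS rather than SIGSEGV, so the sketch catches both.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and sigsetjmp under strict -std= modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* Signal handler: abandon the faulting store and jump back into store_faults(). */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/* Hypothetical helper: map one zero-filled anonymous page and make it
 * read-only unless writable is nonzero.  Returns NULL on failure. */
static char *map_page(int writable)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    char *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (!writable && mprotect(p, (size_t)pagesz, PROT_READ) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: store one byte at p.  Returns 1 if the store
 * faulted (the MMU refused it), 0 if it succeeded. */
static int store_faults(volatile char *p)
{
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    int faulted;
    if (sigsetjmp(fault_jmp, 1) == 0) {
        *p = 'x';          /* checked by the MMU against the page's protection */
        faulted = 0;
    } else {
        faulted = 1;       /* arrived here from on_fault() */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

Calling store_faults() on a page from map_page(1) returns 0. Calling it on a page from map_page(0) returns 1, and that page still reads as zero. The stack-growth and copy-on-write cases never reach the handler at all, because the kernel resolves those faults itself.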

Also, you imply that it is possible for a process to be privileged in a
way that allows it to make such memory accesses.  This is not the case
in any system I have ever heard of.

> My question asked how such a scheme was implemented.  Specifically,
> it asked if hardware support was needed to implement such a scheme.
> I asked the question because I did not understand how the kernel,
> being just another process (not hardware), could enforce memory
> restrictions on another process when at the time the kernel is not
> even executing.

The kernel is not really "just another process".  "Process" is a
software notion, created and maintained by the kernel.  The kernel is
special because it (usually) executes in supervisor mode (see my next
paragraph).

> The answer is, yes, hardware support is required.

> The cpu has what is called an MMU (Memory Management Unit).

There is another critical notion: that of user mode and supervisor mode
(the latter is also called by names such as kernel mode[%]).  On a
modern machine (as
in, one capable of running a multiuser system like NetBSD), there is a
mode bit that grants or denies certain privileges, typically the
ability to execute certain instructions.  On the 68020, for example,
the MOVES and MOVEC instructions work only when they are executed in
supervisor mode.

[%] There is at least one machine - the VAX - where there are four
    modes (kernel, executive, supervisor, and user), all different.
    For our purposes, and for all UNIX derivatives I know of, two modes
    are all that are used.  I'll continue to speak of the privileged
    mode as "supervisor" and the nonprivileged mode as "user"; on the
    VAX, what I am calling "supervisor mode" is what the hardware docs
    speak of as "kernel mode".
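A toy model of the mode bit may help here. It describes a hypothetical CPU, not the 68020. The opcodes merely stand in for an ordinary instruction and for privileged ones such as MOVEC or the instructions that load the MMU. The point is that the hardware performs the check on every instruction, so user code cannot simply switch itself into supervisor mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CPU with a single mode bit.  All opcodes are hypothetical. */
enum mode { USER_MODE, SUPERVISOR_MODE };
enum opcode { OP_ADD, OP_LOAD_MMU_ROOT, OP_SET_MODE };

struct cpu {
    enum mode mode;
    unsigned  acc;          /* an ordinary register */
    unsigned  mmu_root;     /* MMU state: only supervisor mode may change it */
    unsigned  priv_traps;   /* privilege-violation traps taken */
};

static bool privileged(enum opcode op) { return op != OP_ADD; }

/* Execute one instruction.  Returns false if it trapped instead. */
static bool execute(struct cpu *c, enum opcode op, unsigned arg)
{
    if (privileged(op) && c->mode != SUPERVISOR_MODE) {
        c->priv_traps++;    /* hardware refuses; the kernel's trap handler runs next */
        return false;
    }
    switch (op) {
    case OP_ADD:           c->acc += arg; break;
    case OP_LOAD_MMU_ROOT: c->mmu_root = arg; break;
    case OP_SET_MODE:      c->mode = arg ? SUPERVISOR_MODE : USER_MODE; break;
    }
    return true;
}
```

In user mode, OP_ADD works, but both OP_LOAD_MMU_ROOT and OP_SET_MODE trap and leave the state unchanged. Only code already running in supervisor mode, which in practice means the kernel entered through a trap, can reload the MMU.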

> This unit keeps track (in its own private memory?)

Yes, essentially.  It usually is not organized in the typical
von Neumann address space way, but it nevertheless is essentially a
small amount of memory that's private to the MMU hardware.

> of virtual memory addressing, memory ownership, and maybe a few other
> things I'm not aware of.

The MMU normally keeps track of virtual-to-physical mappings, which are
really just lookup tables, and protection.  Every memory access from
user mode must pass the protection checks set up in the MMU (or the
access is denied and an exception occurs) and then the lookup table
turns it into a physical address.  (To reduce the size of these lookup
tables, they normally work on just the high N bits of the address,
where N varies from one machine to another, sometimes from one variant
to another, with the low bits passed through unchanged.)
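The lookup-table idea can be sketched as a one-level page table. The page size, address width and struct layout below are assumptions chosen for illustration. Real MMUs typically use multi-level tables and caches of recent translations, and they hold the entries in their own hardware format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy one-level MMU.  4 KiB pages (PAGE_SHIFT = 12) and a 20-bit virtual
 * address space are assumptions chosen to keep the table small. */
#define PAGE_SHIFT 12
#define VA_BITS    20
#define NPTES      (1u << (VA_BITS - PAGE_SHIFT))    /* 256 table entries */

struct pte {
    bool     valid;      /* false: no-access, any use traps */
    bool     writable;
    uint32_t frame;      /* physical page number */
};

/* Translate a user-mode access.  On success store the physical address in
 * *pa and return true; return false where the MMU would raise an exception. */
static bool translate(const struct pte *table, uint32_t va, bool write, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_SHIFT;               /* high bits: looked up */
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);  /* low bits: passed through */
    if (vpn >= NPTES)
        return false;
    const struct pte *e = &table[vpn];
    if (!e->valid || (write && !e->writable))         /* protection check comes first */
        return false;
    *pa = (e->frame << PAGE_SHIFT) | offset;
    return true;
}
```

With entry 3 mapped read-only to frame 0x2a, a read of 0x3123 translates to 0x2a123. A write to the same address, or any access to an unmapped page, fails instead.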

On many machines, the MMU affects supervisor-mode accesses as well,
and there are more protection bits, to permit setting up memory
that user-mode can access, that supervisor-mode can access but
user-mode can't, or that not even supervisor-mode can access.  (This is
a simplification, since there are at least three common types of
access[%].  The last kind may sound useless, but it's valuable as an
aid to catching bugs in the kernel.)

[%] Read (read as data), write (write as data), and execute (read for
    instruction execution); some hardware makes no distinction between
    read and execute.
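The protection combinations described above can be modelled with one set of read/write/execute bits per mode. This encoding is hypothetical. Real MMUs pack these bits differently, and, as noted, some treat read and execute as a single permission.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical protection encoding: separate read/write/execute bits for
 * user mode and for supervisor mode. */
enum { PR = 1, PW = 2, PX = 4 };

struct prot { unsigned user; unsigned super; };

/* May an access of the given kind(s) proceed in the given mode? */
static bool allowed(struct prot p, bool supervisor, unsigned kind)
{
    unsigned bits = supervisor ? p.super : p.user;
    return (bits & kind) == kind;
}

static const struct prot user_data   = { PR | PW, PR | PW };  /* both modes */
static const struct prot kernel_data = { 0,       PR | PW };  /* supervisor only */
static const struct prot guard_page  = { 0,       0       };  /* nobody: catches kernel bugs */
```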

> The kernel process is given special privileges by the CPU/MMU to read
> and write these memory tables.

This is usually done by ensuring that the hardware-provided mechanisms
to read and write the MMU setup are accessible to supervisor mode but
not user mode.

> It assigns an address space to a process.  When that process attempts
> to access memory it isn't supposed to, the MMU interrupts the process,
> swaps it out to someplace, and executes the kernel code (previously
> set up by the kernel) to handle the page fault.

Right in outline, wrong in some details.  The MMU just notices the
attempted access and causes the CPU to take a memory addressing trap.
The MMU does not "swap [] out" the user process; if that is done, it's
done by the kernel.  The trap amounts to little more than "save the CPU
state somewhere", "set mode to supervisor", and "jump to the memory
management exception handler".  (The hardware locates the address for
this jump typically by looking in a table set up by the kernel at boot
time, a table specified in a way that can be used only by supervisor
mode.)
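To show how little the hardware itself does on a trap, here is a toy model of trap entry and return. The trap number, table size and saved fields are assumptions, and the handler name is hypothetical. On real hardware the table lives in memory that only supervisor mode can change.

```c
#include <assert.h>
#include <stdint.h>

/* Toy trap entry.  Trap numbers, vector-table size and saved-state
 * fields are assumptions, not any real CPU's layout. */
enum { TRAP_MMU = 1, NTRAPS = 16 };

struct cpu;
typedef void (*trap_handler)(struct cpu *);

struct cpu {
    int          supervisor;        /* the mode bit */
    uint32_t     pc;
    uint32_t     saved_pc;          /* state saved by the hardware on a trap */
    int          saved_supervisor;
    trap_handler vectors[NTRAPS];   /* filled in by the kernel at boot */
};

/* All the hardware does: save state, set supervisor mode, jump via the table. */
static void take_trap(struct cpu *c, int trapno)
{
    c->saved_pc = c->pc;
    c->saved_supervisor = c->supervisor;
    c->supervisor = 1;
    c->vectors[trapno](c);          /* from here on it is kernel software */
}

/* Return from trap: put the saved state back. */
static void return_from_trap(struct cpu *c)
{
    c->pc = c->saved_pc;
    c->supervisor = c->saved_supervisor;
}

/* Hypothetical kernel handler: records the mode it ran in.  A real one
 * would decide whether to page in, grow the stack, or post SIGSEGV. */
static int handler_ran_in_supervisor = -1;
static void mmu_fault_handler(struct cpu *c)
{
    handler_ran_in_supervisor = c->supervisor;
}
```

Any "swapping out" or signalling is decided inside a handler such as mmu_fault_handler(). It is not part of take_trap().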

> This normally results in the kernel sending a SIGSEGV to the
> "malicious process".

Yes, since you started with a process attempting to "access memory it
isn't supposed to".  But the same mechanisms are used to provide
virtual memory, that is, more memory than actually exists on the
system.  When this is being done, the MMU is set up so that the memory
that exists only virtually (that is, not in physical memory but
actually out on disk somewhere) is no-access in the MMU.  Then when the
user process tries to access it and takes the MMU exception trap, the
kernel notices that the access was to a valid address and fetches the
required page off disk into some physical memory page, fiddles the MMU
to point to that physical page and resets the protections to what they
should be, and lets the access happen.
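The demand-paging path can be sketched under deliberately simplified assumptions. The pages are tiny, there is a fixed number of frames with no eviction, an array stands in for the disk, and the function names are hypothetical. The key step is that the fault handler first decides whether the address is legitimate. Only after that does it fill a frame, update the mapping, and let the access be retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy demand paging.  Page size, counts, the "disk" array and the
 * never-evicting frame allocator are assumptions for illustration. */
#define NPAGES  8     /* virtual pages */
#define NFRAMES 4     /* physical page frames */
#define PAGE    16    /* bytes per page (tiny on purpose) */

struct vm {
    bool in_address_space[NPAGES]; /* is this page a legitimate address? */
    int  frame_of[NPAGES];         /* -1: not resident, mapped no-access */
    char disk[NPAGES][PAGE];       /* backing store */
    char ram[NFRAMES][PAGE];       /* physical memory */
    int  next_free;                /* next unused frame */
    int  faults;                   /* MMU exception traps taken */
};

static void vm_init(struct vm *v)
{
    memset(v, 0, sizeof *v);
    for (int i = 0; i < NPAGES; i++)
        v->frame_of[i] = -1;
}

/* Kernel's MMU-exception handler.  Returns true if the faulting access
 * can be retried, false if the process should be sent SIGSEGV. */
static bool page_fault(struct vm *v, int vpage)
{
    v->faults++;
    if (vpage < 0 || vpage >= NPAGES || !v->in_address_space[vpage])
        return false;                         /* genuinely bad address */
    assert(v->next_free < NFRAMES);           /* no page replacement here */
    int f = v->next_free++;
    memcpy(v->ram[f], v->disk[vpage], PAGE);  /* fetch the page off disk */
    v->frame_of[vpage] = f;                   /* fix up the mapping */
    return true;
}

/* A user-mode load of one byte; returns -1 where SIGSEGV would be sent. */
static int load_byte(struct vm *v, int vpage, int off)
{
    assert(off >= 0 && off < PAGE);
    if (vpage < 0 || vpage >= NPAGES || v->frame_of[vpage] < 0) {
        if (!page_fault(v, vpage))
            return -1;
    }
    return (unsigned char)v->ram[v->frame_of[vpage]][off];
}
```

The first load from a valid but non-resident page takes one fault and then succeeds. A second load from the same page takes no fault. A load from a page outside the address space returns -1.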

> The exploit script posted with the assembly source was a way of
> subverting this mechanism in a way I still don't fully understand.  A
> more detailed understanding of the above process, specific to the
> machine and kernel where the bug exists/existed, is necessary.

Right.

> With my current understanding and assuming there is no bug in the
> hardware itself, I don't understand how the exploit script is able to
> overwrite any kernel memory.  If at the point where it tries to
> overflow some buffer (where is this buffer? In the hardware?) it is
> stopped by the MMU, how then does it actually get to overwrite kernel
> memory?

I'll explain below.

> Does it actually get to access the memory, THEN the kernel is told
> that a fault has occurred, and then due to the bug the kernel
> doesn't clean things up properly?

A reasonable guess, but no, that's not what's going on.

> I guess a start might be to know what the asm instructions "restore"
> and "save" do.

The SPARC architecture involves something called "register windows".
There are at any moment 32 registers accessible, divided into four
groups of eight: the "ins" %i0-%i7, the "outs" %o0-%o7, the "locals"
%l0-%l7, and the "globals" %g0-%g7.

Conceptually, there is an infinite chain of "windows" of registers,
with the ins of each window being the same as the outs of the adjacent
window (the globals are common to all windows):

    +-------- window #N --------+           +------- window #N+2 -------+
   /                             \         /                             \
    %i0...%i7 %l0...%l7 %o0...%o7           %i0...%i7 %l0...%l7 %o0...%o7
... %o0...%o7           %i0...%i7 %l0...%l7 %o0...%o7           %i0...%i7 ...
             /         \                             /         \
... --------+           +------- window #N+1 -------+           +-------- ...

At any given time, exactly one window is current.  All register
accesses use the current window to determine which physical register is
accessed for a given register number.  A "save" instruction shifts from
using window N to using window N+1; a "restore" shifts the other way.
Normally, a procedure call puts values in %o0 through %o7; the "save"
makes the calling stack frame's registers inaccessible, and at the same
time shifts those values to %i0 through %i7 (or more precisely, it
shifts names so that the name %i0 refers to the same register that the
name %o0 used to refer to, and similarly for the other seven).  The
called procedure then has its own %l0...%l7 and %o0...%o7 to use as it
pleases; when it's done, a "restore" shifts back to the caller's
window, with %i0...%i7 being shifted back to %o0...%o7.
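The renaming that save and restore perform can be modelled as index arithmetic over a circular file of physical registers. This is a toy sketch that assumes NWINDOWS = 8. Real SPARC hardware decrements the current window pointer (CWP) on save, so it differs from the picture above only in the direction of numbering. The sketch does not special-case %g0, which always reads as zero on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Toy SPARC register windows.  NWINDOWS = 8 is an assumption. */
#define NWINDOWS 8

struct regfile {
    int globals[8];                 /* %g0-%g7, shared by every window */
    int windowed[NWINDOWS * 16];    /* per window: 8 locals, then 8 ins */
    int cwp;                        /* current window pointer */
};

/* Return the physical register that a name such as %o3 refers to right now. */
static int *reg(struct regfile *rf, char kind, int n)
{
    assert(n >= 0 && n < 8);
    int w = rf->cwp;
    switch (kind) {
    case 'g': return &rf->globals[n];
    case 'l': return &rf->windowed[w * 16 + n];
    case 'i': return &rf->windowed[w * 16 + 8 + n];
    case 'o': /* the outs of window w are the ins of window w-1 */
        return &rf->windowed[((w + NWINDOWS - 1) % NWINDOWS) * 16 + 8 + n];
    default:
        assert(!"bad register kind");
        return NULL;
    }
}

/* save: the caller's outs become our ins; restore undoes it. */
static void save(struct regfile *rf)    { rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS; }
static void restore(struct regfile *rf) { rf->cwp = (rf->cwp + 1) % NWINDOWS; }
```

A value written to %o0 before save reads back as %i0 after it. The callee's %l0 is a different physical register from the caller's, and restore makes the value visible as %o0 again.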

However, of course the hardware doesn't have an infinite set of
registers.  It actually has enough registers for some small number of
windows, typically 8; a current window pointer (CWP, a field of the
processor status register, which user mode cannot read directly) saying
which of those is the current window; and a (privileged) register
saying which of those windows are accessible.  If
a user process does a save or restore that would cause it to shift into
an inaccessible window, a trap is taken in a way similar to the MMU
traps I described above: the CPU shifts to supervisor mode and jumps to
the handler set up by the kernel.  It is the responsibility of this
handler to arrange for the desired window to be accessible.  In the
case of a save instruction (a window overflow trap), this means dumping
one of the full windows to the stack; for a restore instruction (a
window underflow trap), it means reading the desired window off the
stack into the registers.  (It would be possible for the trap handler
to write or read more than one window at a time, but this normally
isn't done because studies have shown it to almost always be
counterproductive.)  In support of this, two of the registers (%o6,
also called %sp, the stack pointer, and %i6, also called %fp, the frame
pointer) are reserved by software convention to keep track of the
stack; they are used by the trap handlers to determine where to
store/load windows to/from.
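The overflow/underflow bookkeeping can be sketched by counting windows rather than modelling their contents. NWINDOWS = 8 is an assumption. As described above, each trap moves one window. One window is always marked invalid, so at most NWINDOWS - 1 frames are ever resident.

```c
#include <assert.h>

/* Toy window overflow/underflow accounting (contents not modelled).
 * NWINDOWS = 8 is an assumption; one window is always invalid in %wim. */
#define NWINDOWS     8
#define RESIDENT_MAX (NWINDOWS - 1)

struct winstate {
    int depth;      /* frames that exist (call depth) */
    int resident;   /* frames held in registers, including the current one */
    int spilled;    /* frames written to the stack by overflow traps */
    int overflows, underflows;
};

static void do_save(struct winstate *s)
{
    if (s->resident == RESIDENT_MAX) {  /* next window is invalid: overflow trap */
        s->overflows++;
        s->resident--;                  /* handler dumps the oldest window to its stack */
        s->spilled++;
    }
    s->depth++;
    s->resident++;
}

static void do_restore(struct winstate *s)
{
    assert(s->depth > 1);               /* cannot return from the outermost frame */
    if (s->resident == 1) {             /* caller's window not in registers: underflow trap */
        s->underflows++;
        s->spilled--;                   /* handler reloads it from the stack */
        s->resident++;
    }
    s->depth--;
    s->resident--;
}
```

Starting from one frame, ten nested calls take four overflow traps, and the ten matching returns then take four underflow traps.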

Now the stage is set, and I can explain the bug the posted code is
trying to take advantage of.

The window underflow and overflow trap handlers run in supervisor mode,
as they must since they have to read and write the %wim (window invalid
mask) register.  The bug is that they don't check that the user process
has access to the place they need to read a window from (for the
underflow trap) or write a window to (for the overflow trap).  If the
user process has trashed %sp or %fp (whichever one the handler uses),
this code will attempt to access somewhere illegal.  The posted code
simply makes the relevant register point to some memory that supervisor
mode can read/write but user mode can't, then takes a window overflow
or underflow trap (depending on whether it wants to read or write).  If
the register pointed to memory that not even supervisor mode could
access, the window trap code would get an invalid access and it would
either kill the user process or panic the system, depending on how
carefully the MMU trap handler checks things.  But as it is, the window
save/restore code doesn't notice that while it, running in supervisor
mode, can access that memory, user mode can't, so it happily reads or
writes it and then returns control to user mode the way it normally
does; the user-mode code then cleans up the %sp/%fp registers so the
stack is back where it belongs.

Of course, the trap handler _should_ check the memory protection before
reading or writing, and in non-buggy systems it does.
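The bug and its fix can be sketched with a toy, word-addressed memory whose pages carry separate user and supervisor permissions. Several details are assumptions: the page layout, the function names, and the 16-word window (8 locals plus 8 ins, which is what a SPARC overflow handler typically saves). The buggy path checks only what supervisor mode may touch. The fixed path also checks that user mode could have touched the same words.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy machine for the window-trap bug.  Sizes, the permission encoding
 * and the function names are assumptions. */
#define PAGE_WORDS   64
#define NPAGES       4
#define WINDOW_WORDS 16            /* %l0-%l7 and %i0-%i7 */

enum { P_USER = 1, P_SUPER = 2 };  /* who may access a page */

struct machine {
    unsigned prot[NPAGES];
    unsigned mem[NPAGES * PAGE_WORDS];   /* word-addressed memory */
};

/* May code running with privilege `who` touch words [addr, addr + n)? */
static bool access_ok(const struct machine *m, unsigned addr, unsigned n, unsigned who)
{
    if (n == 0 || addr + n < addr || addr + n > NPAGES * PAGE_WORDS)
        return false;
    for (unsigned p = addr / PAGE_WORDS; p <= (addr + n - 1) / PAGE_WORDS; p++)
        if (!(m->prot[p] & who))
            return false;
    return true;
}

/* Checks made by the window trap handlers, which run in supervisor mode. */
static bool window_access_ok(const struct machine *m, unsigned sp, bool check_user)
{
    if (!access_ok(m, sp, WINDOW_WORDS, P_SUPER))
        return false;   /* not even supervisor may touch it: nested MMU fault */
    if (check_user && !access_ok(m, sp, WINDOW_WORDS, P_USER))
        return false;   /* the check the buggy handlers omitted */
    return true;
}

/* Overflow trap: dump one window to the stack at the user-supplied sp. */
static int spill_window(struct machine *m, unsigned sp,
                        const unsigned win[WINDOW_WORDS], bool check_user)
{
    if (!window_access_ok(m, sp, check_user))
        return -1;      /* a real kernel would signal or kill the process */
    memcpy(&m->mem[sp], win, WINDOW_WORDS * sizeof win[0]);
    return 0;
}

/* Underflow trap: reload one window from the stack at sp. */
static int fill_window(const struct machine *m, unsigned sp,
                       unsigned win[WINDOW_WORDS], bool check_user)
{
    if (!window_access_ok(m, sp, check_user))
        return -1;
    memcpy(win, &m->mem[sp], WINDOW_WORDS * sizeof win[0]);
    return 0;
}
```

With check_user false, a user-chosen %sp pointing into a supervisor-only page lets the spill overwrite that page and the fill read it back. With check_user true, both are refused. A page that no mode may access is refused either way.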

If anyone is interested, I'll be glad to correspond further on any
points which still remain unclear.

					der Mouse

			    mouse@collatz.mcrcim.mcgill.edu